DX12 Basics
A work-in-progress reference for some DX12 basics, written as I learn. Check out Frank D. Luna’s books for an excellent and thorough introduction to the topic.
Conceptual Diagram
Executing Commands
An application submits commands to the GPU via a CommandQueue. Execution is asynchronous. The GPU idles if the queue is empty, and the CPU stalls on submission if the queue is full. A good application keeps both busy.
Commands are recorded in CommandLists.
CommandLists are submitted to the CommandQueue via ExecuteCommandLists(). CommandLists are executed in the order they are submitted.
A CommandList must be Close()d before it can be executed.
Once a CommandList has been submitted with ExecuteCommandLists(), it can be Reset() and re-used to record a new set of commands, even before the GPU has finished executing it. Reset() re-initializes the CommandList and is cheaper than destroying it and creating a new one.
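The lifecycle rules above can be captured in a small state machine. This is a CPU-only C++ sketch, not D3D12 code. CommandListModel, ListState and their methods are hypothetical names. They mirror, but do not call, ID3D12GraphicsCommandList::Close() and Reset().

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the CommandList lifecycle; no D3D12 API is used.
enum class ListState { Recording, Closed };

class CommandListModel {
public:
    // A newly created command list starts in the recording (open) state.
    ListState state = ListState::Recording;
    int recordedCommands = 0;

    // Recording is only allowed while the list is open.
    void Record() {
        if (state != ListState::Recording)
            throw std::logic_error("cannot record into a closed list");
        ++recordedCommands;
    }

    // Close() ends recording; it is only valid on an open list.
    void Close() {
        if (state != ListState::Recording)
            throw std::logic_error("list is already closed");
        state = ListState::Closed;
    }

    // Stand-in for the ExecuteCommandLists() precondition: only closed lists qualify.
    bool CanExecute() const { return state == ListState::Closed; }

    // Reset() reopens a closed list so it can record a new set of commands.
    void Reset() {
        if (state != ListState::Closed)
            throw std::logic_error("close the list before resetting it");
        state = ListState::Recording;
        recordedCommands = 0;
    }
};
```

The model also shows why it helps to Close() a freshly created list: a render loop that calls Reset() at the start of every frame would otherwise fail on its very first frame.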
Commands in a CommandList are recorded into a CommandAllocator, which is the memory backing of the commands. Therefore, the CommandAllocator cannot be reset until the GPU finishes executing the commands. This requires synchronization (Fence).
Once the GPU has finished executing a CommandList, the CommandList’s CommandAllocator can be Reset() to record new commands.
Multiple CommandLists can be associated with the same CommandAllocator. However, only one of them can record at a time, and the others must be in a closed state. In essence, commands are allocated contiguously in the CommandAllocator while a CommandList is recording.
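The allocator rules can be sketched the same way. CommandAllocatorModel is a hypothetical CPU-only stand-in for ID3D12CommandAllocator. The "completed" fence value it compares against is the value the CPU would read back from a Fence, as described under Synchronization. The model also assumes that an allocator cannot be reset while one of its lists is still recording.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the CommandAllocator rules; no D3D12 API is used.
class CommandAllocatorModel {
public:
    // Only one associated list may record at a time; returns false otherwise.
    bool BeginRecording(int listId) {
        if (recordingList_ != kNone) return false;
        recordingList_ = listId;
        return true;
    }

    // Called when the recording list is closed.
    void EndRecording(int listId) {
        if (recordingList_ == listId) recordingList_ = kNone;
    }

    // Remember the fence value signaled after the last submission backed by this allocator.
    void MarkSubmitted(std::uint64_t fenceValue) { pendingFenceValue_ = fenceValue; }

    // Resetting is safe only when no list is recording and the GPU has reached that fence value.
    bool CanReset(std::uint64_t completedFenceValue) const {
        return recordingList_ == kNone && completedFenceValue >= pendingFenceValue_;
    }

private:
    static constexpr int kNone = -1;
    int recordingList_ = kNone;
    std::uint64_t pendingFenceValue_ = 0;
};
```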
When a CommandList is created, it starts in the open (recording) state, and Reset() also leaves it open. If the render loop begins each frame with Reset(), it is convenient to Close() the list right after creating it.
Resources and Descriptors/Views
DX12 (and Vulkan) decouple resources and descriptors. Descriptors are also known as views.
A Resource is the texture or buffer data in memory.
A Descriptor describes how the Resource is accessed in different stages of the graphics pipeline. For example, a render target view (RTV) lets the pipeline draw into a texture. A shader resource view (SRV) allows a shader to read from a texture.
A Descriptor can also map to a subregion of the Resource and reinterpret the type of the data elements (for typeless resources).
If a Resource is typeless, then the Descriptor must specify a type. Typed Resources are best for performance; use typeless Resources only when strictly necessary.
Descriptor creation incurs some validation overhead. Create descriptors during initialization if possible.
Types of Descriptors
- CBV: constant buffer view, for reading constant buffer data.
- SRV: shader resource view, for reading textures.
- UAV: unordered access view, to read/write texture and buffer data.
- Sampler: to sample textures via their SRVs.
- RTV: render target view, to render into textures.
- DSV: depth/stencil view, to describe depth/stencil buffers.
Descriptor Heaps
Descriptors are allocated from a DescriptorHeap. A DescriptorHeap is the memory backing for descriptors of one heap type. An application needs at least one DescriptorHeap for each heap type it uses: CBVs, SRVs and UAVs share a single heap type, while Samplers, RTVs and DSVs each have their own. Multiple DescriptorHeaps of the same type can also exist.
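Here is a sketch of how descriptors are addressed inside a heap. A heap is an array of equally sized slots, so a descriptor's CPU handle is the heap's start handle (from ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart()) plus the index times the increment size reported by ID3D12Device::GetDescriptorHandleIncrementSize(). The C++ below mirrors that arithmetic and the descriptor-to-heap-type mapping. It uses hypothetical stand-in types (HeapType, DescriptorKind, CpuHandle) instead of the real D3D12 structs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the four D3D12_DESCRIPTOR_HEAP_TYPE values.
enum class HeapType { CbvSrvUav, Sampler, Rtv, Dsv };
enum class DescriptorKind { CBV, SRV, UAV, Sampler, RTV, DSV };

// CBVs, SRVs and UAVs share one heap type; the other kinds each have their own.
HeapType HeapTypeFor(DescriptorKind kind) {
    switch (kind) {
        case DescriptorKind::CBV:
        case DescriptorKind::SRV:
        case DescriptorKind::UAV:     return HeapType::CbvSrvUav;
        case DescriptorKind::Sampler: return HeapType::Sampler;
        case DescriptorKind::RTV:     return HeapType::Rtv;
        case DescriptorKind::DSV:     return HeapType::Dsv;
    }
    return HeapType::CbvSrvUav; // not reached for valid enum values
}

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a SIZE_T 'ptr'.
struct CpuHandle { std::size_t ptr; };

// Slots are laid out at a fixed stride. The real stride depends on the heap type and the
// hardware, so it must be queried with GetDescriptorHandleIncrementSize(), not hard-coded.
CpuHandle OffsetHandle(CpuHandle heapStart, unsigned index, unsigned incrementSize) {
    return CpuHandle{heapStart.ptr + static_cast<std::size_t>(index) * incrementSize};
}
```

For example, with a start pointer of 1000 and an increment size of 32, OffsetHandle(CpuHandle{1000}, 3, 32) yields a handle whose ptr is 1096. The d3dx12.h helper CD3DX12_CPU_DESCRIPTOR_HANDLE offers an Offset() method for the same computation.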
Resource Heaps
Resources are also allocated in heaps. When creating a resource (CreateCommittedResource()), we must specify the desired heap type:
- Default heap: for resources exclusively accessed by the GPU.
- Upload heap: for resources that require data uploads from the CPU to the GPU.
- Readback heap: for resources that need to be read back by the CPU.
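The following rule-of-thumb sketch picks a heap type and returns the initial state a committed buffer needs on that heap. D3D12 requires upload-heap resources to be created in GENERIC_READ and readback-heap resources in COPY_DEST. HeapKind, State, ChooseHeap() and InitialState() are hypothetical names. The real values are D3D12_HEAP_TYPE_DEFAULT/UPLOAD/READBACK and D3D12_RESOURCE_STATE_COMMON/GENERIC_READ/COPY_DEST.

```cpp
#include <cassert>

// Hypothetical mirrors of D3D12_HEAP_TYPE and a few D3D12_RESOURCE_STATES values.
enum class HeapKind { Default, Upload, Readback };
enum class State { Common, GenericRead, CopyDest };

// Rule of thumb: choose the heap from how the CPU touches the resource.
HeapKind ChooseHeap(bool cpuWrites, bool cpuReads) {
    if (cpuWrites) return HeapKind::Upload;
    if (cpuReads)  return HeapKind::Readback;
    return HeapKind::Default; // GPU-only access
}

// Upload-heap resources must start in GENERIC_READ and readback-heap resources in
// COPY_DEST; default-heap buffers can start in COMMON.
State InitialState(HeapKind heap) {
    switch (heap) {
        case HeapKind::Upload:   return State::GenericRead;
        case HeapKind::Readback: return State::CopyDest;
        case HeapKind::Default:  return State::Common;
    }
    return State::Common; // not reached for valid enum values
}
```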
Synchronization
CPU-GPU Synchronization
Fences are used for CPU-GPU synchronization.
The example above shows how to safely Reset() a CommandAllocator by making sure that all commands in the CommandQueue backed by the CommandAllocator have been executed on the GPU.
To establish a synchronization point, the CPU calls CommandQueue::Signal() with a Fence and a fence value. Signal() does not block: it enqueues a signal command behind the work already submitted to the queue. When the GPU reaches that synchronization point, it signals the CPU by setting the Fence to the given value. Typically, this value is incremented by one every time a new synchronization point is established.
The CPU can check the Fence value in two ways. One is to call Fence::GetCompletedValue(), which is non-blocking. The other way is blocking: create a Windows event object, call SetEventOnCompletion(), then WaitForSingleObject(); the calling thread is put to sleep until the GPU signals the Fence.
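To make the signal/wait protocol concrete, here is a CPU-only C++ model with no D3D12 calls. FenceModel, QueueModel, GpuStep() and WaitForFence() are hypothetical names. GpuStep() plays the role of the GPU reaching the next Signal. WaitForFence() stands in for the blocking path, which with the real API roughly amounts to: if fence->GetCompletedValue() < value, call fence->SetEventOnCompletion(value, event), then WaitForSingleObject(event, INFINITE).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical CPU-only model of a Fence; mirrors ID3D12Fence::GetCompletedValue().
struct FenceModel {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; } // non-blocking read
};

// Hypothetical model of a CommandQueue that only tracks Signal commands.
struct QueueModel {
    std::deque<std::uint64_t> pendingSignals; // signals queued behind submitted work
    FenceModel* fence = nullptr;

    // Like CommandQueue::Signal(): enqueue the signal and return immediately.
    void Signal(std::uint64_t value) { pendingSignals.push_back(value); }

    // Simulates the GPU reaching the next synchronization point.
    bool GpuStep() {
        if (pendingSignals.empty()) return false;
        fence->completed = pendingSignals.front();
        pendingSignals.pop_front();
        return true;
    }
};

// Stand-in for the event-based blocking wait: let the simulated GPU advance until the
// fence reaches the target value (or nothing is left that could signal it).
void WaitForFence(QueueModel& queue, const FenceModel& fence, std::uint64_t target) {
    while (fence.GetCompletedValue() < target) {
        if (!queue.GpuStep()) break;
    }
}
```

Because Signal() only enqueues, the completed value stays at 0 until the simulated GPU advances. Waiting for value 1 after two signals leaves the second signal pending.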
GPU Workload Synchronization
Unlike OpenGL and previous versions of DirectX, applications also need to manage GPU workload synchronization. For example, if shader A writes to a texture through an RTV or UAV and shader B reads from it through an SRV or UAV, then a synchronization point must be established to prevent a resource hazard.
CommandList::ResourceBarrier() establishes a synchronization point between GPU workloads. Two common types of barriers are:
- Resource Transition Barrier: declares a transition in a resource’s usage.
- UAV Barrier: declares that all current UAV accesses to a resource must complete before future accesses can begin.
Resources are associated with a usage, or state, that defines how a resource is used. A Resource Transition Barrier declares a change in a resource’s state. Based on these barriers, the driver inserts the synchronization needed to prevent resource hazards when the GPU reaches them.
For example, with double buffering, when beginning a new frame the previous frame’s front buffer becomes the current frame’s back buffer. Before we can render to this resource in the current frame, we must transition it from D3D12_RESOURCE_STATE_PRESENT to D3D12_RESOURCE_STATE_RENDER_TARGET. Then, once we have rendered the current frame and are ready to Present(), we perform another transition from D3D12_RESOURCE_STATE_RENDER_TARGET back to D3D12_RESOURCE_STATE_PRESENT.
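Here is a sketch of tracking a back buffer's state across one frame. With the real API, each transition is a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_TRANSITION, with its StateBefore and StateAfter fields set, passed to ResourceBarrier(). The C++ below models only the bookkeeping. ResState, Transition and TrackedResource are hypothetical names. A mismatched "before" state is rejected here, much as the debug layer can flag it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the two D3D12_RESOURCE_STATES used in the example above.
enum class ResState { Present, RenderTarget };

// Stand-in for a transition barrier: the before/after states of one resource.
struct Transition { ResState before; ResState after; };

// Tracks a resource's current state and records the barriers a frame would issue.
class TrackedResource {
public:
    explicit TrackedResource(ResState initial) : state_(initial) {}

    // Returns false and records nothing if 'before' does not match the tracked state.
    bool TransitionTo(ResState before, ResState after, std::vector<Transition>& barriers) {
        if (before != state_) return false;
        barriers.push_back({before, after});
        state_ = after;
        return true;
    }

    ResState State() const { return state_; }

private:
    ResState state_;
};
```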
A UAV Barrier, on the other hand, synchronizes access to a UAV. This is typically needed when shader A writes to a UAV that shader B then reads from, or when two shaders write to the same UAV. In the first case, A must finish its work before B can start executing. In the latter, a barrier is needed to guarantee write order unless we can guarantee that the shaders write to different parts of the UAV.
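The rule in the previous paragraph can be written as a small predicate. This is a simplified model that describes each access as a hypothetical byte range (UavAccess). Real code would issue a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_UAV through ResourceBarrier() whenever the predicate says one is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how one draw/dispatch touches a UAV.
struct UavAccess {
    bool writes;
    std::uint64_t begin; // first byte touched
    std::uint64_t end;   // one past the last byte touched
};

// Half-open ranges [begin, end) overlap if each starts before the other ends.
bool Overlaps(const UavAccess& a, const UavAccess& b) {
    return a.begin < b.end && b.begin < a.end;
}

// Between two consecutive accesses to the same UAV, a barrier is needed when at least
// one of them writes and the touched regions overlap. Two reads never need one.
bool NeedsUavBarrier(const UavAccess& first, const UavAccess& second) {
    if (!first.writes && !second.writes) return false;
    return Overlaps(first, second);
}
```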
Uploading Buffer Data
Buffers that are accessed only by the GPU should be placed in the default heap for best performance. However, resources on the default heap are not CPU-writable. To upload buffer data, the application must instead create a buffer on the upload heap (“upload buffer”), write the data into that buffer, and then copy the upload buffer into the destination buffer with CopyResource() or CopyBufferRegion().
The upload buffer cannot be released or re-used until the GPU finishes executing the CopyResource() or CopyBufferRegion() command. This requires CPU-GPU synchronization as usual.
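The destination buffer must also undergo the appropriate resource transitions: it must be in D3D12_RESOURCE_STATE_COPY_DEST during the copy, and is then transitioned to the state in which it will be used.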
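Here is a sketch of the upload sequence, recorded as human-readable strings rather than real D3D12 calls. RecordBufferUpload() and CanReleaseUploadBuffer() are hypothetical helpers. The explicit COMMON to COPY_DEST barrier follows a common pattern. Buffers in COMMON can also be promoted to COPY_DEST implicitly on first use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: each string stands in for one recorded command.
std::vector<std::string> RecordBufferUpload(const std::string& finalState) {
    return {
        // The destination (default heap) buffer must be in COPY_DEST while it is written.
        "ResourceBarrier(dest: COMMON -> COPY_DEST)",
        // Copy from the upload buffer, which the CPU has already filled (e.g. via Map()).
        "CopyBufferRegion(dest, upload)",
        // Move the destination into the state its later users expect.
        "ResourceBarrier(dest: COPY_DEST -> " + finalState + ")",
    };
}

// The upload buffer may be released or reused only once the GPU has passed the fence
// value signaled after the command list containing the copy.
bool CanReleaseUploadBuffer(std::uint64_t completedFenceValue, std::uint64_t copyFenceValue) {
    return completedFenceValue >= copyFenceValue;
}
```

For a vertex buffer, finalState would correspond to D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER.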
Diagrams rendered with PlantUML.